Recognizing Semi-Automatically Author’s Intentions From Scientific Documents
Abstract
The growing volume of electronic documents on the Web makes their retrieval difficult. One reason for this difficulty is the lack of a rich representation of their structure. Intentional structures promise to be a new paradigm for extending existing document structures and improving the different phases of document processing, such as creation, editing, search, and retrieval. The objective of this work is to propose a model of intentional structure and an analyzer that recognizes the author's intentions in written documents from a specific domain. The system relies, on the one hand, on an ontology of intentions and, on the other, on a set of algorithms that facilitate building the intentional structure. The main principle of these algorithms is to reproduce the writer's skills. The role of the system is to recognize the intentional structure, that is, to segment a document semi-automatically according to the author's intentions, and to extract from each segment the intentional verbs together with their associated concepts, using the analyzer's algorithms. The analyzer can also update the ontology of intentions, enriching the knowledge base that contains all possible intentions of a domain. This article presents experiments on scientific publications in the field of computer science.
Keywords: Information Research, Analyzer, Intentional Structure, Segmentation, Ontology.
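The abstract does not describe the analyzer's algorithms. As a rough illustration of the pipeline it outlines (look up intentional verbs in an ontology, segment the text by intention, extract each verb with its concept, and enrich the ontology with user-validated verbs), here is a minimal Python sketch. The ontology contents, the names `detect`, `segment` and `enrich`, and the heuristics (exact lowercase verb matching, a three-word window for the concept) are hypothetical assumptions, not the authors' method.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy ontology of intentions: intention -> verbs that signal it.
# The authors' actual ontology is not given in the abstract.
ONTOLOGY: dict[str, set[str]] = {
    "propose": {"propose", "present", "introduce"},
    "evaluate": {"evaluate", "measure", "compare"},
    "conclude": {"conclude", "show", "demonstrate"},
}

# Words skipped when reading the concept that follows an intentional verb.
STOPWORDS = {"a", "an", "the", "our", "this", "that"}


@dataclass
class Segment:
    intention: str
    sentences: list[str] = field(default_factory=list)
    verb_concepts: list[tuple[str, str]] = field(default_factory=list)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect(sentence: str, ontology: dict[str, set[str]]):
    """Return (intention, verb, concept) for the first intentional verb, else None.

    The concept is approximated by up to three non-stopword tokens after the verb.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    for i, word in enumerate(words):
        for intention, verbs in ontology.items():
            if word in verbs:
                tail = [w for w in words[i + 1:] if w not in STOPWORDS]
                return intention, word, " ".join(tail[:3])
    return None


def segment(text: str, ontology: dict[str, set[str]]) -> list[Segment]:
    """Group consecutive sentences into segments that share one intention.

    A sentence without an intentional verb stays in the current segment;
    sentences before the first match go into an 'unknown' segment.
    """
    segments: list[Segment] = []
    for sentence in split_sentences(text):
        hit = detect(sentence, ontology)
        if hit is not None:
            intention, verb, concept = hit
            if not segments or segments[-1].intention != intention:
                segments.append(Segment(intention))
            segments[-1].verb_concepts.append((verb, concept))
        elif not segments:
            segments.append(Segment("unknown"))
        segments[-1].sentences.append(sentence)
    return segments


def enrich(ontology: dict[str, set[str]], intention: str, verb: str) -> None:
    """Add a user-validated verb to the ontology (the semi-automatic step)."""
    ontology.setdefault(intention, set()).add(verb.lower())


if __name__ == "__main__":
    doc = ("We propose a model of intentional structure. It relies on an ontology. "
           "We evaluate the analyzer on computer science papers. "
           "We conclude that segmentation helps retrieval.")
    for seg in segment(doc, ONTOLOGY):
        print(seg.intention, seg.verb_concepts)
    # propose [('propose', 'model of intentional')]
    # evaluate [('evaluate', 'analyzer on computer')]
    # conclude [('conclude', 'segmentation helps retrieval')]
```

Keeping the ontology as a plain mapping makes the enrichment step trivial: once a user confirms that an unrecognized verb expresses a known intention, `enrich` adds it and later runs of `segment` pick it up. A real analyzer would need lemmatization and proper noun-phrase extraction instead of exact matching and a fixed word window.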